@ergunsh ergunsh commented Jan 21, 2026

This PR completes the telemetry system by implementing the transport layer for ClearcutSender. It enables actual HTTP communication with the Clearcut backend, handling event batching, rate limiting, and reliable delivery, including robust shutdown handling.

Key Changes:

  • HTTP Transport: Implemented fetch-based transport sending POST requests to the Clearcut HTTP server.
  • Event Batching: Events are now buffered and flushed periodically (default: 15 minutes) or on shutdown.
  • Reliability & Rate Limiting:
    • Server-Side Backoff: Respects next_request_wait_millis from server responses to handle rate limiting dynamically.
    • Transient Error Retries: Failed requests (5xx, 429) result in events being requeued for the next flush.
    • Request Timeouts: Enforced 30s timeout on requests to prevent hanging processes.
    • Session Rotation: Automatically rotates session IDs every 24 hours.
  • Safety & Stability:
    • Buffer Overflow Protection: Caps the buffer at 1000 events to bound memory use, dropping the oldest events once the cap is reached.
    • Optimistic Removal: Prevents race conditions and duplicate events during shutdown by optimistically removing events from the buffer before sending.
  • Testing Improvements:
    • E2E Robustness: Updated E2E tests to use a mock web server instead of relying on the logger to log specific lines.
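
As an illustration of the buffer cap and optimistic removal described above, here is a minimal sketch. It is not the code from this PR: `EventBuffer`, `takeAll`, and `requeue` are hypothetical names, and requeueing at the front of the buffer is an assumption about how failed batches are merged back.

```typescript
// Hypothetical sketch; names and requeue policy are assumptions, not the PR's code.
const MAX_BUFFER_SIZE = 1000; // cap stated in the PR description

class EventBuffer<T> {
  private events: T[] = [];
  private dropped = 0;

  // Append an event; when the buffer is full, drop the oldest one first.
  add(event: T): void {
    if (this.events.length >= MAX_BUFFER_SIZE) {
      this.events.shift();
      this.dropped++;
    }
    this.events.push(event);
  }

  // Optimistic removal: hand the whole batch to the caller and clear the
  // buffer before the request is sent, so a concurrent flush (for example
  // during shutdown) cannot pick up the same events again.
  takeAll(): T[] {
    const batch = this.events;
    this.events = [];
    return batch;
  }

  // After a transient failure, put the batch back ahead of newer events,
  // still enforcing the cap by dropping the oldest entries.
  requeue(batch: T[]): void {
    const merged = [...batch, ...this.events];
    const overflow = Math.max(0, merged.length - MAX_BUFFER_SIZE);
    this.dropped += overflow;
    this.events = merged.slice(overflow);
  }

  size(): number {
    return this.events.length;
  }

  droppedCount(): number {
    return this.dropped;
  }
}
```

A flush would call `takeAll()`, send the batch, and call `requeue(batch)` only if the failure is classified as transient.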

Implementation Roadmap:
These changes finalize the planned telemetry architecture:

  1. CLI & Opt-out Mechanism (Merged)
  2. Logger Scaffolding & Integration (Merged)
  3. Persistence Layer (Merged)
  4. Watchdog Process Architecture (Merged)
  5. Transport, Batching & Retries (This PR):
    • Finalized ClearcutSender with HTTP transport, batching, and server-directed backoff strategies.
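
A minimal sketch of server-directed backoff, assuming the next flush is scheduled at whichever is later: the default interval or the server's `next_request_wait_millis`. The function name `nextFlushDelayMs`, the `LogResponse` shape, and the acceptance of a string-encoded value are assumptions made for illustration, not taken from this PR.

```typescript
// Hypothetical sketch; only next_request_wait_millis and the 15 minute
// default come from the PR description.
const DEFAULT_FLUSH_INTERVAL_MS = 15 * 60 * 1000;

interface LogResponse {
  // Assumed to arrive either as a number or as a numeric string.
  next_request_wait_millis?: number | string;
}

// Returns how long to wait before the next flush. A missing, invalid, or
// non-positive server hint falls back to the default interval; a valid
// hint can only lengthen the wait, never shorten it.
function nextFlushDelayMs(
  response: LogResponse | undefined,
  defaultMs: number = DEFAULT_FLUSH_INTERVAL_MS,
): number {
  const hinted = Number(response?.next_request_wait_millis);
  if (!Number.isFinite(hinted) || hinted <= 0) {
    return defaultMs;
  }
  return Math.max(defaultMs, hinted);
}
```

In a daisy-chained scheduler, each completed flush would pass this value to `setTimeout` to arm the next one, so only one timer is pending at a time.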

@ergunsh ergunsh assigned ergunsh and OrKoN and unassigned ergunsh Jan 21, 2026
@OrKoN OrKoN self-requested a review January 21, 2026 14:19
@OrKoN OrKoN removed their assignment Jan 21, 2026
Implements the telemetry transport layer per the design doc:
- Ring buffer with overflow handling (1000 events max)
- HTTP transport using native fetch API with 30s timeout
- Daisy-chain flush scheduling (15min default interval)
- Transient vs permanent error classification (5xx/429 retry, 4xx drop)
- Server-side rate limiting via next_request_wait_millis
- Shutdown handling with 5s timeout for final flush
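
The error classification and request timeout listed above could look roughly like the sketch below. `classifyStatus`, `postBatch`, and the `Outcome` type are hypothetical names; treating network errors and timeouts as transient, and any other non-2xx status as a drop, are assumptions rather than behavior confirmed by this PR.

```typescript
// Hypothetical sketch; not the PR's actual implementation.
const REQUEST_TIMEOUT_MS = 30_000; // 30s timeout stated in the commit message

type Outcome = 'success' | 'retry' | 'drop';

// 2xx succeeds; 429 and 5xx are transient (events are requeued);
// everything else, including other 4xx responses, is dropped.
function classifyStatus(status: number): Outcome {
  if (status >= 200 && status < 300) {
    return 'success';
  }
  if (status === 429 || status >= 500) {
    return 'retry';
  }
  return 'drop';
}

// POSTs a serialized batch using the native fetch API. The request is
// aborted after REQUEST_TIMEOUT_MS so a stalled connection cannot keep
// the process alive.
async function postBatch(url: string, body: string): Promise<Outcome> {
  try {
    const response = await fetch(url, {
      method: 'POST',
      headers: {'Content-Type': 'application/json'},
      body,
      signal: AbortSignal.timeout(REQUEST_TIMEOUT_MS),
    });
    return classifyStatus(response.status);
  } catch {
    // Network failure or timeout: assumed to be transient here.
    return 'retry';
  }
}
```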
@OrKoN OrKoN force-pushed the telemetry/fetch-transport-05 branch from 356657c to f0e8a8f Compare January 27, 2026 14:02
@OrKoN OrKoN changed the title from "chore: Implement ClearcutSender HTTP transport for telemetry" to "chore: Implement ClearcutSender HTTP transport for telemetry disabled by default" Jan 27, 2026
@OrKoN OrKoN enabled auto-merge January 27, 2026 14:18
@OrKoN OrKoN added this pull request to the merge queue Jan 27, 2026
Merged via the queue into main with commit 210bacd Jan 27, 2026
21 checks passed
@OrKoN OrKoN deleted the telemetry/fetch-transport-05 branch January 27, 2026 14:22